An Efficient Method for Finding Closed Subspace Clusters for High Dimensional Data

Authors

  • S. Anuradha
  • Jaya Lakshmi

Abstract

Subspace clustering tries to find groups of objects in a given dataset that are similar only within a subset of the feature space. It searches for meaningful clusters in all possible subspaces. However, most of the resulting subspace clusters are redundant and provide no new information. Hence there is a need to eliminate such redundant subspace clusters and output only the non-redundant ones, each contributing some unique information to the data miner. A set of non-redundant subspace clusters is also easier to analyze. To accomplish this, the concept of closedness is applied to subspace clusters. An algorithm called Finding Closed Subspace Clusters (FCSC) is presented, which efficiently outputs the closed subspace clusters from a given set of subspace clusters produced by any subspace clustering algorithm. In the experimental study, FCSC generates far fewer clusters than the existing SUBCLU algorithm, and the average purity of the clusters improves marginally without loss of coverage.
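The closedness idea can be illustrated with a small Python sketch. The exact definition of a closed subspace cluster used by FCSC is not given here, so the sketch assumes one borrowed from closed frequent itemsets (where an itemset is closed if no proper superset has the same support): a cluster, viewed as a pair (objects, dimensions), is treated as closed if no other cluster groups exactly the same objects in a strictly larger set of dimensions. The names `SubspaceCluster`, `closed_clusters`, `average_purity` and `coverage` are hypothetical, and purity and coverage follow commonly assumed definitions (majority-label fraction per cluster, and fraction of objects in at least one cluster), not necessarily those used in the paper's experiments.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, NamedTuple


class SubspaceCluster(NamedTuple):
    """A subspace cluster: a set of object ids together with its relevant dimensions."""
    objects: FrozenSet[int]
    dims: FrozenSet[int]


def closed_clusters(clusters: List[SubspaceCluster]) -> List[SubspaceCluster]:
    """Return the clusters that are closed under an ASSUMED criterion:
    no other cluster has the same object set in a strictly larger subspace."""
    # Group clusters by object set (dict.fromkeys drops exact duplicates, keeps order).
    groups: Dict[FrozenSet[int], List[SubspaceCluster]] = {}
    for c in dict.fromkeys(clusters):
        groups.setdefault(c.objects, []).append(c)
    closed = []
    for group in groups.values():
        for c in group:
            # '<' on frozensets is the proper-subset test.
            if not any(c.dims < other.dims for other in group):
                closed.append(c)
    return closed


def average_purity(clusters: List[SubspaceCluster], labels: Dict[int, str]) -> float:
    """Mean over clusters of the fraction of objects carrying the cluster's majority
    label (assumed definition; clusters are assumed to be non-empty)."""
    if not clusters:
        return 0.0
    total = 0.0
    for c in clusters:
        counts = Counter(labels[o] for o in c.objects)
        total += max(counts.values()) / len(c.objects)
    return total / len(clusters)


def coverage(clusters: List[SubspaceCluster], n_objects: int) -> float:
    """Fraction of the dataset's objects that belong to at least one cluster."""
    covered = set().union(*(c.objects for c in clusters))
    return len(covered) / n_objects


# Toy input standing in for the output of some subspace clustering algorithm.
C = SubspaceCluster
clusters = [
    C(frozenset({1, 2, 3}), frozenset({0})),
    C(frozenset({1, 2, 3}), frozenset({0, 1})),  # same objects, larger subspace
    C(frozenset({1, 2}), frozenset({0, 1, 2})),
    C(frozenset({4, 5}), frozenset({2})),
]
labels = {1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

closed = closed_clusters(clusters)
print(len(clusters), "->", len(closed))            # 4 -> 3
print(round(average_purity(closed, labels), 3))    # 0.889
print(coverage(clusters, 5), coverage(closed, 5))  # 1.0 1.0
```

Under this assumed criterion every discarded cluster shares its object set with a retained cluster in a larger subspace, so coverage cannot decrease; in the toy data the average purity also rises from about 0.833 to 0.889 because a redundant, less pure duplicate is removed.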


Similar resources

Finding and Visualizing Subspace Clusters of High Dimensional Dataset Using Advanced Star Coordinates

Analysis of high-dimensional data has been a research area for many years. Analysts can detect similarity of data points within a cluster. Subspace clustering detects useful dimensions in clustering high-dimensional datasets. Visualization allows better insight into subspace clusters. However, displaying such high-dimensional database clusters on a 2-dimensional display is a challenging task. We p...

Hierarchical Subspace Clustering

It is well known that traditional clustering methods considering all dimensions of the feature space usually fail in terms of efficiency and effectiveness when applied to high-dimensional data. This poor behavior is due to the fact that clusters may not be found in the high-dimensional feature space, although clusters exist in subspaces of the feature space. To overcome these limitations of tra...

Subspace Clustering for Uncertain Data

Analyzing uncertain databases is a challenge in data mining research. Usually, data mining methods rely on precise values. In scenarios where uncertain values occur, e.g. due to noisy sensor readings, these algorithms cannot deliver high-quality patterns. Besides uncertainty, data mining methods face another problem: high-dimensional data. For finding object groupings with locally relevant dimens...

An Efficient Density Conscious Subspace Clustering Method using Top-down and Bottom-up Strategies

Clustering high-dimensional data is an emerging research field. Most clustering techniques use distance measures to build clusters. In high-dimensional spaces, traditional clustering algorithms suffer from a problem called the “curse of dimensionality”. Subspace clustering groups similar objects embedded in subspaces of the full space. Recent approaches attempt to find clusters embedded in subspace of h...

Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms

Finding clusters in high-dimensional data is a challenging task, as such data comprises hundreds of attributes. Subspace clustering is an evolving methodology which, instead of finding clusters in the entire feature space, aims at finding clusters in various overlapping or non-overlapping subspaces of the high-dimensional dataset. Density-based subspace clustering algorithms t...

Publication date: 2016